Okay, so today I would like to try to bridge two topics that are quite closely related, but are perhaps seen as distinct for historical reasons: optimal control and reinforcement learning. I guess many of you are much more acquainted with optimal control, so at the beginning of my talk I will try to explain very briefly what I mean by reinforcement learning, what reinforcement learning is, and then I will try to make the bridge.
Okay, so the goal of reinforcement learning is basically this. Suppose that you are an agent, pictured here as a brain, and this agent has to move or make decisions in an environment that is unknown or only partially known. From the environment the agent gets back observations and a reward, which can be positive or negative. For example, if you have a dog that you are trying to train, you want it to do something, so you give it a positive reward when it behaves the way you want and a negative reward when it behaves the way you don't want.
Okay, so reinforcement learning is one of the three paradigms of machine learning: supervised learning, unsupervised learning, and reinforcement learning. It is basically the third one, and it has achieved some notable successes in the last, I would say, fifteen years. One of them: I don't know if you play chess or any of these online games, but they all now come with artificial intelligence programs, engines like Stockfish and so on, and many of the strongest ones are based on reinforcement learning techniques. One of the first of this kind was AlphaGo, later followed by AlphaZero for chess, but there were also others before. AlphaGo was extremely successful because Go is in some sense an even more complicated game than chess, with a huge number of legal states, and AlphaGo was the first artificial intelligence able to beat a human champion at this game. This happened in 2016, and the result was published by David Silver and co-authors in Nature. It was a huge success because it came years ahead of schedule: predictions based on the growth of available computing power had suggested that this might take at least another decade, and the fact that it was achieved with a more reasonable amount of computation made it a great achievement.
Okay, but the link between reinforcement learning and optimal control has actually been well known since the early days of reinforcement learning. If you try to learn something about reinforcement learning, I guess you will run into the monograph by Sutton and Barto, and there you can find a reference to a paper from the control community, from 1992 I think, co-authored by Sutton, Barto, and Williams, whose title says everything: reinforcement learning is direct adaptive optimal control. What does direct adaptive optimal control mean? Well, basically, reinforcement learning is nothing else than trying to learn the policy that the agent has to pursue without learning the model behind it. That is basically the main goal of reinforcement learning, and in this sense it is a direct form of adaptive optimal control. Now, just to give you the flavor of what a reinforcement learning algorithm looks like, let me show you, for instance, one of the simplest reinforcement learning algorithms, which is called Q-learning.
This was published, I think in 1989, by Watkins in his PhD thesis. Now I'm using notation that is maybe a little more familiar to mathematicians, but basically: suppose you are given a reward, an initial state, and an initialization of the so-called action-value function. Suppose that the agent moves according to a certain policy. This policy can be a greedy policy like this one, or an epsilon-greedy policy, meaning that it is greedy with a certain probability and random with the complementary probability. It could also be purely random.
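As a sketch in standard notation (the state x, action u, and exploration parameter ε are the usual symbols and are not taken from the speaker's slides), the epsilon-greedy policy just described can be written as:

```latex
% Epsilon-greedy policy built from the current action-value estimate Q:
% exploit (be greedy) with probability 1 - \varepsilon, explore at random otherwise.
\pi_{\varepsilon}(x) =
\begin{cases}
\arg\max_{u} Q(x,u) & \text{with probability } 1-\varepsilon,\\
\text{a uniformly random action} & \text{with probability } \varepsilon.
\end{cases}
```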
Here Q is the action-value function, meaning that it is a function which, when you maximize with respect to u, gives the value function. That is what they do to build Q: they start from an initialization. They say Q is...
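The slide itself is not reproduced in the transcript; as a sketch in standard notation, the relation between Q and the value function, together with Watkins' update rule that the initialization feeds into, reads as follows (the learning rate α and discount factor γ are standard assumptions, not taken from the slide):

```latex
% The value function is obtained by maximizing the action-value function over actions.
V(x) = \max_{u} Q(x,u).

% Watkins' Q-learning update after observing a transition (x, u, r, x'):
Q(x,u) \;\leftarrow\; Q(x,u) + \alpha \Bigl( r + \gamma \max_{u'} Q(x',u') - Q(x,u) \Bigr).
```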
[Answering a question from the audience] No, the model is not given. What you have is just... You assume that at each stage you can observe where you are. You assume that if I am here and I move like this, then I am here, and if I then move like this, I am here. That is the only thing you assume. Basically, you assume that at the beginning you have a table of states and actions, and in each entry of this table you have the value of Q for that state-action pair.
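The transcript breaks off here. As a concrete companion to this description, the following is a minimal sketch of tabular Q-learning with an epsilon-greedy policy; the environment interface (reset and step returning the observed next state, reward, and a termination flag) and all parameter values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def q_learning(env, n_states, n_actions,
               episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    `env` is assumed to expose `reset() -> state` and
    `step(action) -> (next_state, reward, done)`; no model of the
    dynamics is ever used, only observed transitions and rewards.
    """
    # The table of states and actions: entry Q[x, u] estimates the
    # action-value of taking action u in state x.
    Q = np.zeros((n_states, n_actions))

    for _ in range(episodes):
        x = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: greedy with probability 1 - epsilon,
            # uniformly random with probability epsilon.
            if np.random.rand() < epsilon:
                u = np.random.randint(n_actions)
            else:
                u = int(np.argmax(Q[x]))

            x_next, r, done = env.step(u)

            # Watkins' update: move Q[x, u] toward the observed reward plus
            # the discounted best value achievable from the next state.
            target = r + gamma * np.max(Q[x_next]) * (not done)
            Q[x, u] += alpha * (target - Q[x, u])

            x = x_next

    return Q
```

Under standard step-size and exploration conditions, the table Q converges to the optimal action-value function, and its greedy policy solves the underlying optimal control problem without ever identifying the model, which is exactly the "direct" character described above.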
Presenter
Prof. Michele Palladino
Access
Open access
Duration
00:35:01 min
Recording date
2025-04-28
Uploaded on
2025-04-30 09:37:20
Language
en-US